Java Threads & Concurrency: Understanding OS-Level Implementation
Table of Contents
- Thread Fundamentals
- OS-Level Thread Management
- CPU and Thread Execution
- Memory and RAM Interaction
- Practical Examples
Thread Fundamentals
What is a Thread?
A thread is the smallest unit of execution that can be scheduled by the operating system. Think of it as a lightweight process that shares memory with other threads in the same process.
Key Concept: When you create a Java thread, you're actually requesting the OS to create a native thread.
// Simple thread creation
Thread thread = new Thread(() -> {
    System.out.println("Running in: " + Thread.currentThread().getName());
});
thread.start(); // This triggers OS-level thread creation
OS-Level Thread Management
The Journey from Java to OS
When you call thread.start() in Java, here's what happens underneath:
Java Application (JVM)
↓
JVM Thread API
↓
Native Thread Library (pthreads on Linux, Windows Threads on Windows)
↓
Operating System Kernel
↓
Scheduler assigns thread to CPU core
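You can see that the JVM itself is backed by several native threads (garbage collector, JIT compiler, reference handler, and so on) by enumerating the live threads. A minimal sketch, assuming a standard HotSpot JVM (the class name is illustrative):

```java
public class LiveThreadsExample {
    public static void main(String[] args) {
        // Every entry here is backed by a native OS thread (1:1 mapping).
        // Expect to see "main" plus JVM service threads.
        Thread.getAllStackTraces().keySet().forEach(t ->
            System.out.println(t.getName() + " (daemon=" + t.isDaemon() + ")"));
    }
}
```

Running this even with no application threads typically prints half a dozen names, showing that the JVM requests native threads from the OS for its own housekeeping.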
Thread Models
1:1 Model (Java's platform threads use this)
- One Java thread = One OS thread
- Each Java thread maps directly to a kernel thread
- (Virtual threads, introduced in Java 21, are the exception: many virtual threads are multiplexed over a small pool of OS threads)
public class ThreadMappingExample {
    public static void main(String[] args) {
        // Creating 3 Java threads = 3 OS threads
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                // threadId() requires Java 19+; use getId() on older JDKs
                System.out.println("Java thread ID: " + Thread.currentThread().threadId());
                // ProcessHandle gives the OS process ID; the native thread IDs
                // inside it are visible with tools like jstack or top -H
                System.out.println("Process ID: " + ProcessHandle.current().pid());
            });
            t.start();
        }
    }
}
CPU and Thread Execution
How CPU Executes Threads
Single Core CPU:
Time Slice 1: Thread A executes
Time Slice 2: Thread B executes (context switch)
Time Slice 3: Thread A executes (context switch)
Time Slice 4: Thread C executes (context switch)
Multi-Core CPU:
Core 1: Thread A ┐
Core 2: Thread B │ All execute simultaneously
Core 3: Thread C │
Core 4: Thread D ┘
Context Switching
When the OS switches from one thread to another, it must:
- Save current thread state (registers, program counter, stack pointer) → RAM
- Load next thread state from RAM → CPU registers
- Resume execution
public class ContextSwitchExample {
    public static void main(String[] args) {
        // With 1000 threads on 8 cores, expect lots of context switching
        for (int i = 0; i < 1000; i++) {
            new Thread(() -> {
                // CPU time slicing happens here
                for (int j = 0; j < 1000000; j++) {
                    Math.sqrt(j); // CPU-intensive work
                }
            }).start();
        }
    }
}
Cost of Context Switching:
- Save/restore CPU registers: ~1-2 microseconds
- Cache invalidation (CPU cache needs to reload data)
- TLB (Translation Lookaside Buffer) flush
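You can get a rough feel for this cost by forcing two threads to hand control back and forth with wait()/notifyAll(): each hand-off involves a context switch. A sketch, not a rigorous benchmark (the class name and round count are illustrative, and absolute numbers vary widely by OS and hardware):

```java
public class ContextSwitchCost {
    private static final Object lock = new Object();
    private static boolean pingTurn = true;
    private static final int ROUNDS = 100_000;

    public static void main(String[] args) throws InterruptedException {
        Thread ping = new Thread(() -> run(true));
        Thread pong = new Thread(() -> run(false));
        long start = System.nanoTime();
        ping.start();
        pong.start();
        ping.join();
        pong.join();
        long elapsed = System.nanoTime() - start;
        // Each round is two hand-offs (ping -> pong, pong -> ping)
        System.out.println("Approx. cost per hand-off: "
                + elapsed / (2L * ROUNDS) + " ns");
    }

    private static void run(boolean isPing) {
        for (int i = 0; i < ROUNDS; i++) {
            synchronized (lock) {
                // Wait until it is this thread's turn
                while (pingTurn != isPing) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                pingTurn = !isPing; // pass the turn to the other thread
                lock.notifyAll();
            }
        }
    }
}
```

The measured figure bundles the switch itself with lock hand-off and wakeup latency, which is why it usually comes out higher than the bare register save/restore cost quoted above.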
Memory and RAM Interaction
Thread Memory Layout
Within a single process, the heap and method area are shared by all threads, while each thread gets its own private stack:
┌─────────────────────────────────┐
│ PROCESS MEMORY SPACE │
├─────────────────────────────────┤
│ Heap (Shared by all threads) │ ← Objects created with 'new'
├─────────────────────────────────┤
│ Method Area (Shared) │ ← Class metadata, static variables
├─────────────────────────────────┤
│ Thread 1 Stack (Private) │ ← Local variables, method calls
├─────────────────────────────────┤
│ Thread 2 Stack (Private) │
├─────────────────────────────────┤
│ Thread 3 Stack (Private) │
└─────────────────────────────────┘
Memory Visibility Problem
public class MemoryVisibilityExample {
    // Without volatile, changes might not be visible across threads
    private static boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        // Thread 1: Reads flag
        Thread reader = new Thread(() -> {
            while (!flag) {
                // CPU might cache 'flag' value in a register
                // Never reads updated value from RAM!
            }
            System.out.println("Flag is now true!");
        });

        // Thread 2: Writes flag
        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(1000);
                flag = true; // Written to CPU cache, maybe not RAM yet
                System.out.println("Flag set to true");
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });

        reader.start();
        writer.start();
    }
}
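The minimal fix is to declare the flag volatile, which forces every read and write to go through main memory rather than a thread-local cached copy. A sketch of the corrected version (class name illustrative):

```java
public class VolatileFixExample {
    // volatile guarantees that writes by one thread are visible to all others
    private static volatile boolean flag = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!flag) {
                // volatile read: always observes the latest written value
            }
            System.out.println("Flag is now true!");
        });

        Thread writer = new Thread(() -> {
            try {
                Thread.sleep(1000);
                flag = true; // volatile write: published to all threads
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        writer.start();
        reader.join(); // terminates reliably, unlike the non-volatile version
        writer.join();
    }
}
```

Note that volatile only fixes visibility; it does not make compound operations like counter++ atomic, which is the separate problem shown next.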
CPU Cache and Memory Hierarchy
CPU Core
├─ L1 Cache (32-64 KB, ~1 ns access)
├─ L2 Cache (256 KB, ~3 ns access)
└─ L3 Cache (Shared, 8-32 MB, ~12 ns access)
↓
Main RAM (GB, ~100 ns access)
Why This Matters:
public class CacheCoherenceExample {
    private static int sharedCounter = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                // Core 1 reads sharedCounter into its cache,
                // increments it, and writes back (eventually)
                sharedCounter++;
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                // Core 2 also reads sharedCounter into its cache
                // Both cores may hold different cached values!
                sharedCounter++;
            }
        });

        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Expected: 200000, Actual: usually less (lost updates)
        System.out.println("Counter: " + sharedCounter);
    }
}
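Besides synchronized (shown in Example 3 below), the lost-update problem can be fixed lock-free with java.util.concurrent.atomic, which relies on the CPU's atomic compare-and-swap instructions. A sketch (class name illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterExample {
    // incrementAndGet() is a single atomic read-modify-write at the hardware level
    private static final AtomicInteger counter = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.incrementAndGet(); // no updates can be lost
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("Counter: " + counter.get()); // Always 200000
    }
}
```

Because no lock is taken, neither thread ever enters the BLOCKED state; contention is resolved by the cache coherence protocol retrying the compare-and-swap.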
Practical Examples
Example 1: CPU-Bound Task
public class CPUBoundExample {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU Cores: " + cores);

        // Creating threads = number of cores is optimal for CPU-bound tasks
        for (int i = 0; i < cores; i++) {
            Thread t = new Thread(() -> {
                // This thread gets a dedicated core
                long sum = 0;
                for (long j = 0; j < 1_000_000_000L; j++) {
                    sum += j;
                }
                System.out.println("Sum: " + sum);
            });
            t.start();
        }
    }
}
What Happens:
- JVM creates 8 threads (on 8-core CPU)
- OS scheduler assigns 1 thread per core
- Each core executes its thread with minimal context switching
- CPU utilization: ~100%
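In practice, the same pattern is usually written with a thread pool sized to the core count rather than raw threads, so worker threads are reused and results can be collected. A sketch using java.util.concurrent (class name illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FixedPoolExample {
    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // Pool size == core count: each worker can keep one core busy
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < cores; i++) {
            results.add(pool.submit(() -> {
                long sum = 0;
                for (long j = 0; j < 100_000_000L; j++) {
                    sum += j;
                }
                return sum;
            }));
        }

        for (Future<Long> f : results) {
            System.out.println("Sum: " + f.get()); // blocks until the task finishes
        }
        pool.shutdown();
    }
}
```

The fixed pool avoids oversubscribing the CPU: submitting more tasks than cores simply queues them instead of creating extra OS threads that would only add context switching.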
Example 2: I/O-Bound Task
public class IOBoundExample {
    public static void main(String[] args) {
        // I/O-bound: Can create many more threads than cores
        for (int i = 0; i < 1000; i++) {
            Thread t = new Thread(() -> {
                try {
                    // Thread blocks, OS removes it from the CPU
                    Thread.sleep(1000); // Simulates I/O wait
                    // Thread wakes, OS schedules it back onto a CPU
                    System.out.println("Done waiting");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            });
            t.start();
        }
    }
}
What Happens:
- Thread calls sleep() → moves to TIMED_WAITING state
- OS removes the thread from the CPU's run queue
- CPU is free for other threads
- After the sleep expires, thread moves back to RUNNABLE → OS schedules it again
Example 3: Proper Synchronization
public class SynchronizedExample {
    private static int counter = 0;
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                synchronized (lock) {
                    // CPU acquires lock (atomic operation at hardware level)
                    // Memory barrier: flushes CPU cache to RAM
                    counter++;
                    // Memory barrier: ensures write is visible
                    // CPU releases lock
                }
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100000; i++) {
                synchronized (lock) {
                    counter++;
                }
            }
        });

        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("Counter: " + counter); // Always 200000
    }
}
OS-Level Operations:
- Thread requests lock → OS/JVM checks lock status
- If locked: Thread goes to BLOCKED state (not using CPU)
- Lock owner releases → OS wakes waiting thread
- synchronized creates memory barriers (CPU-level instructions)
- Cache coherence protocol ensures all cores see the update
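The BLOCKED state described above can be observed directly: start one thread that holds the lock, then sample a second thread's state while it waits for it. A small sketch (class name and sleep timings are illustrative):

```java
public class BlockedStateExample {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try {
                    Thread.sleep(1000); // hold the lock for a while
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // only entered once holder releases the lock
            }
        });

        holder.start();
        Thread.sleep(100); // let holder acquire the lock first
        waiter.start();
        Thread.sleep(100); // let waiter hit the contended lock
        // Typically prints BLOCKED: waiter is parked off-CPU by the OS/JVM
        System.out.println("Waiter state: " + waiter.getState());

        holder.join();
        waiter.join();
    }
}
```

While BLOCKED, the waiter consumes no CPU time; the scheduler only makes it runnable again when the monitor is released.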
Example 4: Understanding Thread States
public class ThreadStatesExample {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    System.out.println("RUNNABLE -> CPU executing");
                    Thread.sleep(1000);
                    System.out.println("TIMED_WAITING -> Off CPU, in RAM");
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        });

        System.out.println("NEW: " + t.getState()); // Thread object in heap
        t.start();
        System.out.println("RUNNABLE: " + t.getState()); // OS scheduled
        Thread.sleep(500);
        System.out.println("TIMED_WAITING: " + t.getState()); // Off CPU
        t.join();
        System.out.println("TERMINATED: " + t.getState()); // OS cleaned up
    }
}
Thread Lifecycle at OS Level
NEW (Java object in heap)
↓ start()
RUNNABLE (OS ready queue)
↓ OS Scheduler
RUNNING (Executing on CPU core)
↓ sleep()/wait()/I/O
WAITING/TIMED_WAITING (Off CPU, in RAM)
↓ notify()/interrupt()/I/O complete
RUNNABLE (Back to OS ready queue)
↓ Execution completes
TERMINATED (OS cleans up resources)
Key Takeaways
- Java Thread = OS Thread: Platform threads map 1:1 to native threads (virtual threads, Java 21+, are the exception)
- Context Switching: Expensive operation, save/restore state from RAM
- CPU Cores: Limit true parallelism (8 cores = max 8 threads running simultaneously)
- Memory Visibility: Changes in one core's cache might not be visible to others without synchronization
- Thread Stack: Each thread gets private stack space in RAM (~1 MB default)
- Shared Heap: All threads share heap memory, need synchronization
- OS Scheduler: Decides which thread runs on which core and when
Understanding these concepts helps you write efficient concurrent programs and debug threading issues!